This study relies on two main datasets. The first consists of spotters' reports extracted from NOAA's historical Storm Events Database, which is available online. The second consists of geotagged Tweets collected and provided by the DOLLY project at the University of Kentucky. The Tweets in this dataset were sent from within the United States between October 24 and October 31, 2012, and carry explicit geographic information in the form of latitude and longitude coordinates.
Spatial data representing state boundaries were downloaded in shapefile format from the United States Census Bureau website. This study uses the polygons representing state boundaries as of 2017.
Based on this spatial dataset, a polygon representing the states affected by Hurricane Sandy was created. According to the SHELDUS database, the following states were affected by the hurricane: Maryland, Delaware, New Jersey, New York, Connecticut, Massachusetts, Rhode Island, North Carolina, Virginia, West Virginia, Ohio, Pennsylvania, New Hampshire, and the District of Columbia.
Based on this information, NWS reports and Tweets sent from these states were selected. We then retained only the Tweets and reports sent during the two days with the most reported impact (October 29 and October 30, 2012), which were also the days with reports from both data sources. After filtering, the NWS dataset contains only 115 reports and the Twitter dataset contains 74,807 Tweets.
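The selection step can be illustrated with a minimal sketch. It assumes each record already carries a state name and a timestamp, whereas the study selected records spatially against the state polygon. The field names `state` and `timestamp`, the function `filter_records`, and the sample records are hypothetical, and time zones are ignored.

```python
from datetime import date, datetime

# States listed in the text as affected by Hurricane Sandy.
AFFECTED_STATES = {
    "Maryland", "Delaware", "New Jersey", "New York", "Connecticut",
    "Massachusetts", "Rhode Island", "North Carolina", "Virginia",
    "West Virginia", "Ohio", "Pennsylvania", "New Hampshire",
    "District of Columbia",
}

# The two days retained in the analysis.
STUDY_DAYS = {date(2012, 10, 29), date(2012, 10, 30)}


def filter_records(records):
    """Keep records from an affected state sent on one of the study days.

    Each record is assumed to be a dict with the hypothetical keys
    'state' (str) and 'timestamp' (datetime).
    """
    return [
        rec for rec in records
        if rec["state"] in AFFECTED_STATES
        and rec["timestamp"].date() in STUDY_DAYS
    ]


# Made-up sample records, for illustration only.
sample = [
    {"state": "New Jersey", "timestamp": datetime(2012, 10, 29, 18, 5)},
    {"state": "New Jersey", "timestamp": datetime(2012, 10, 27, 9, 0)},
    {"state": "Kentucky", "timestamp": datetime(2012, 10, 30, 12, 0)},
    {"state": "New York", "timestamp": datetime(2012, 10, 30, 23, 59)},
]
selected = filter_records(sample)
```

The same function would be applied once to the Tweets and once to the NWS reports, so that both datasets pass through identical state and date criteria.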
The purpose of this analysis is to estimate the correspondence between the spatial distribution of Twitter reports and that of reports collected in a traditional fashion by the NWS. Grids of hexagons at a range of sizes are used to compare the overall variance in the density of reports across cells. That density is estimated for each data set as follows:
\(\text{Report density} = \frac{\text{Number of reports in hexagon}}{\text{Total number of reports}}\)
\(\text{Tweet density} = \frac{\text{Number of Tweets in hexagon}}{\text{Total number of Tweets}}\)
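As a rough illustration of the two ratios above, the following sketch computes the share of records falling in each hexagon. It assumes every record has already been assigned a hexagon identifier. The function name `density_by_hexagon` and the identifiers `h1` to `h4` are made up for the example.

```python
from collections import Counter


def density_by_hexagon(hexagon_ids):
    """Return {hexagon_id: share of all records that fall in that hexagon}."""
    counts = Counter(hexagon_ids)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {hex_id: n / total for hex_id, n in counts.items()}


# Made-up hexagon assignments, one entry per record.
tweet_hexagons = ["h1", "h1", "h2", "h3", "h1", "h2", "h1", "h3"]
report_hexagons = ["h1", "h2", "h2", "h4"]

tweet_density = density_by_hexagon(tweet_hexagons)
report_density = density_by_hexagon(report_hexagons)
```

Because each density is a share of its own dataset's total, the two datasets become comparable despite their very different sizes. Hexagons with no records are simply absent from the result and would be treated as zero.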
To vary the hexagon size, 32 hexagonal grids covering the study area were created, with the following numbers of hexagons: 100, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, 10000, 10500, 11000, 11500, 12000, 12500, 13000, 13500, 14000, 14500, 15000, 15500.
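The list of grid sizes follows a regular pattern: one coarse grid of 100 hexagons, then steps of 500 up to 15,500. A short sketch, using the hypothetical name `GRID_SIZES`, reproduces the list and confirms that it has 32 entries.

```python
# One coarse grid of 100 hexagons, then 500 to 15500 in steps of 500.
GRID_SIZES = [100] + list(range(500, 15501, 500))

# Number of grids described in the text.
n_grids = len(GRID_SIZES)
```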
To count reports per hexagon, the reports (both Tweets and NWS reports) were joined to the hexagons.
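The counting step can be sketched without GIS software. The example below is not the join used in the study: it swaps in an arithmetic assignment of points to a regular pointy-top hexagonal grid using axial coordinates and cube rounding, and it assumes the points are already in a planar (projected) coordinate system rather than raw latitude and longitude. The names `point_to_hex`, `count_per_hexagon`, and `size` are hypothetical.

```python
import math
from collections import Counter


def point_to_hex(x, y, size):
    """Return the axial (q, r) index of the pointy-top hexagon containing (x, y).

    `size` is the centre-to-corner distance, in the same planar units as x and y.
    """
    # Fractional axial coordinates of the point.
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    s = -q - r
    # Cube rounding: round all three coordinates, then recompute the one
    # with the largest rounding error so that q + r + s stays zero.
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def count_per_hexagon(points, size):
    """Count how many (x, y) points fall in each hexagon."""
    return Counter(point_to_hex(x, y, size) for x, y in points)


# Made-up planar points, for illustration only.
points = [(0.0, 0.0), (0.1, 0.1), (1.7, 0.0), (1.8, 0.1), (0.8, 1.5)]
counts = count_per_hexagon(points, size=1.0)
```

Changing `size` changes the number of hexagons needed to cover the study area, which is how the cell resolution would be varied under this approach. A polygon-based spatial join in GIS software gives the same counts for a regular grid and also handles clipping to the study area.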
Maps comparing the total number of Tweets with the density of Tweets at several grid resolutions are presented as follows: